In this document, I provide a write-up of how this model is similar to, and different from, the versions previously implemented. This model assembles multiple sites at the same time, which may or may not be connected in some fashion. Along the way, I document some of the input and output formats that I expect. At the end, I present some preliminary results; I especially want to use this as a vehicle to think about how to analyse them.
First, we load up the preliminary results. In this case, we are loading a system in which 34 basal species and 66 consumer species form the pool for 10 unconnected environments. The pool and interaction matrix were assembled with the default parameters from Law and Morton’s 1996 work.
library(dplyr)
Warning: package ‘dplyr’ was built under R version 4.1.1
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(ggplot2)
Warning: package ‘ggplot2’ was built under R version 4.1.1
library(plotly)
Warning: package ‘plotly’ was built under R version 4.1.1
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
library(ggfortify) # used for biplots of PCAs
Warning: package ‘ggfortify’ was built under R version 4.1.1
# library(RMTRCode2)
load(file.path(
  "..", "experiments", "MNA-FirstAttempt-Result-Env10-None.RData")
)
In total, 9320 events were used in these environments, with the species and the environment for each invasion both assigned uniformly at random. The numbers of arrival and extinction events were each controlled to be half of this total. We chose this number via the coupon collector's problem. In particular, we use the result that the number of draws needed to encounter every species is bounded in probability: \[\text{Pr}(\text{Draws} < n \log_{e} n + c n) \rightarrow \exp(-\exp(-c)) \text{ as } n \rightarrow \infty\] where \(n\) is the number of species and \(c\) is a constant. For our purposes, we choose \(c = 5\), giving a probability of about \(99.3\%\) of seeing each species in each environment. In practice, we failed to observe 2 species-environment combinations. Notably, all but 5 species had at least one successful invasion.
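As a quick sanity check on these numbers, the bound is easy to evaluate directly (this uses only the formula itself; \(n = 100\) matches the pool size above):

```r
# Coupon-collector sanity check for the chosen constant c = 5.
n <- 100     # species in the pool
cval <- 5    # the constant c from the bound
draws <- ceiling(n * log(n) + cval * n)  # draws for the bound to apply
prob <- exp(-exp(-cval))                 # limiting probability of full coverage
draws  # 961
prob   # ~0.9933
```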
The initial abundance was set to 4000 times the elimination threshold, in line with work on minimum viable populations (Traill et al. 2007). The elimination threshold is admittedly more arbitrary, since it sets an effective individual-area relationship. For this calculation, I set it to \(10^{-4}\), in line with our previous calculations. This is large enough to avoid numerical difficulties with precision, while being low enough to represent a decent-sized region.
Since all environments assemble simultaneously, I chose exponential waiting times between arrival events and between extinction events. Note that these rates are shared across species and environments, but arrival and extinction are fully independent of each other. The species and environment affected by each event were chosen uniformly at random. The question then is how to set the rate.
To set the rate in this case, I chose the largest eigenvalue magnitude of the per-environment interaction matrices. This magnitude corresponds to the strongest response we can see from the interaction matrix and, if the interaction matrix is a good approximation to the Jacobian around a stable fixed point (which is not guaranteed), indicates the characteristic time scale of the decay to equilibrium. Hence, overall arrival rates and overall extinction rates happen on the same time scale as the fastest dynamics in the system. Since there are 10 environments, we should then expect, on average, 10 characteristic time scales to pass between arrival events in the same environment.
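A minimal sketch of how such a rate could be derived and used; the matrix `A` here is a small random stand-in, not the actual per-environment interaction matrix:

```r
set.seed(1)

# Stand-in interaction matrix; in the model this would be one of the
# per-environment interaction matrices (hypothetical values here).
A <- matrix(rnorm(25, sd = 0.5), nrow = 5)

# Rate = largest eigenvalue magnitude of the interaction matrix.
rate <- max(Mod(eigen(A, only.values = TRUE)$values))

# Exponential waiting times between events; each event then affects a
# uniformly chosen species and environment.
numEvents <- 10
eventTimes <- cumsum(rexp(numEvents, rate = rate))
species <- sample(100, numEvents, replace = TRUE)
environment <- sample(10, numEvents, replace = TRUE)
```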
With 10 environments, it is probably not helpful to check 10 individual abundance curves, but looking at the first one might be helpful.
obj + ggplot2::scale_y_log10() + ggplot2::guides(color = "none")
Every vertical line marks a species introduction or an extinction by neutral dynamics.
Perhaps more intriguing might be some sense of the biodiversity that we have in each system. We break the abundance results up by environment, then calculate the number of non-zero abundance curves at each time point. We also calculate the Shannon entropy (reminder: higher entropy means more uncertainty which means a flatter distribution).
Diversity <- lapply(
  1:result$NumEnvironments,
  function(i, abund, numSpecies) {
    time <- abund[, 1]
    env <- abund[, 1 + 1:numSpecies + numSpecies * (i - 1)]
    richness <- rowSums(env != 0)
    abundSum <- rowSums(env)
    entropy <- env / abundSum
    entropy <- -apply(
      entropy, MARGIN = 1,
      FUN = function(x) {
        sum(ifelse(x != 0, x * log(x), 0))
      })
    species <- apply(
      env, MARGIN = 1,
      FUN = function(x) {
        toString(which(x > 0))
      }
    )
    data.frame(Time = time,
               Richness = richness,
               Entropy = entropy,
               Species = species,
               Environment = i)
  },
  abund = result$Abundance,
  numSpecies = (ncol(result$Abundance) - 1) / result$NumEnvironments
)
Diversity <- dplyr::bind_rows(Diversity)
Diversity <- Diversity %>% dplyr::mutate(
  Evenness = Entropy / log(Richness)
)
ggplot2::ggplot(
  Diversity,
  ggplot2::aes(
    x = Time,
    y = Richness,
    color = factor(Environment),
    alpha = ifelse(Environment == 1, 1, 0.3)
  )
) + ggplot2::geom_line(
) + ggplot2::geom_line(
  data = Diversity %>% dplyr::group_by(
    Time
  ) %>% dplyr::summarise(
    Richness = mean(Richness)
  ),
  mapping = ggplot2::aes(
    x = Time,
    y = Richness
  ),
  color = "black",
  inherit.aes = FALSE
) + ggplot2::guides(
  alpha = "none"
) + ggplot2::scale_color_discrete(
  "Environment"
)
So richness hovers around a similar regime throughout the majority of the simulation. Note that in this plot we have emphasised one environmental curve and superimposed the mean in black. We do reach a height of 11 species in one environment, but such heights are short-lived. Instead, we observe a (time- and environment-averaged) richness of about 4.46. (If we treat the first 10,000 time units as burn-in, we instead see about 4.59.)
ggplot2::ggplot(
  Diversity,
  ggplot2::aes(
    x = Time,
    y = Entropy,
    color = factor(Environment),
    alpha = ifelse(Environment == 1, 1, 0.3)
  )
) + ggplot2::geom_line(
) + ggplot2::geom_line(
  data = Diversity %>% dplyr::group_by(
    Time
  ) %>% dplyr::summarise(
    Entropy = mean(Entropy)
  ),
  mapping = ggplot2::aes(
    x = Time,
    y = Entropy
  ),
  color = "black",
  inherit.aes = FALSE
) + ggplot2::guides(
  alpha = "none"
) + ggplot2::scale_color_discrete(
  "Environment"
)
Warning: Removed 8043 row(s) containing missing values (geom_path).
Warning: Removed 2297 row(s) containing missing values (geom_path).
Entropy shows similar, but much more erratic, behaviour. Following a single entropy curve, one sees fairly substantial periods of almost smooth behaviour followed by sudden bursts of noise, and noise seems to be the dominant mode among the low-alpha environments in the background. Some predictions are easy to make: extinctions reduce the entropy in the system, as we become more certain about what remains, while arrivals analogously increase it. We can probably see the relationship beyond these principles better by plotting entropy against richness and connecting observations by environment and time.
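Before that, a toy calculation (not drawn from the simulation) illustrates the extinction prediction for the typical case of a rare species near the elimination threshold:

```r
# Shannon entropy of a relative-abundance vector.
shannon <- function(p) -sum(ifelse(p > 0, p * log(p), 0))

# Toy community: one dominant species and two rare ones.
before <- c(98, 1, 1)
after  <- c(98, 1)   # one rare species goes extinct

H_before <- shannon(before / sum(before))
H_after  <- shannon(after / sum(after))
H_before > H_after   # TRUE: the extinction lowered the entropy
```

(Extinction of a dominant species can actually raise the renormalised entropy, but dynamic extinctions here remove species whose abundance has fallen to the threshold, i.e. rare ones.)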
# ggplot2::ggplot(
# Diversity %>% dplyr::filter(Environment < 4),
# ggplot2::aes(
# x = Richness,
# y = Entropy,
# group = Environment,
# color = Time
# )
# ) + ggplot2::geom_path(
# ) + ggplot2::guides(
# alpha = "none"
# )
plotly::plot_ly(data = Diversity %>% dplyr::filter(Environment < 2),
                x = ~Richness, y = ~Entropy, z = ~Time, type = "scatter3d",
                mode = "lines", opacity = 1, line = list(color = ~Time))
It is quite hard to tell, but there does not appear to be any particular orientation (clockwise or counter-clockwise) or similar pattern here.
ggplot2::ggplot(
  Diversity,
  ggplot2::aes(
    x = Time,
    y = Evenness,
    color = factor(Environment),
    alpha = ifelse(Environment == 1, 1, 0.3)
  )
) + ggplot2::geom_line(
) + ggplot2::geom_line(
  data = Diversity %>% dplyr::group_by(
    Time
  ) %>% dplyr::summarise(
    Evenness = mean(Evenness)
  ),
  mapping = ggplot2::aes(
    x = Time,
    y = Evenness
  ),
  color = "black",
  inherit.aes = FALSE
) + ggplot2::guides(
  alpha = "none"
) + ggplot2::scale_color_discrete(
  "Environment"
)
Warning: Removed 10986 row(s) containing missing values (geom_path).
Warning: Removed 2831 row(s) containing missing values (geom_path).
Evenness helps highlight that this is a high-variance process with a relatively constrained mean.
We can, of course, flip the idea on its head: instead of examining the diversity of species within environments, we can look at the diversity of environments occupied by each species. Since very few species end up occupying many environments, we just look at richness. Unfortunately, this is quite a memory-intensive task.
EnvDiversity <- lapply(
  1:((ncol(result$Abundance) - 1) / result$NumEnvironments),
  function(i, abund, numSpecies) {
    time <- abund[, 1]
    env <- abund[, 1 + i + numSpecies * (1:result$NumEnvironments - 1)]
    richness <- rowSums(env != 0)
    abundSum <- rowSums(env)
    environments <- apply(
      env, MARGIN = 1,
      FUN = function(x) {
        toString(which(x > 0))
      }
    )
    data.frame(Time = time,
               Richness = richness,
               Abundance = abundSum,
               Species = i,
               Environments = environments)
  },
  abund = result$Abundance,
  numSpecies = (ncol(result$Abundance) - 1) / result$NumEnvironments
)
EnvDiversity <- dplyr::bind_rows(EnvDiversity)
ggplot2::ggplot(
  EnvDiversity %>% dplyr::filter(Richness > 1),
  ggplot2::aes(x = Time, y = Richness, color = Species)
) + ggplot2::geom_point(
  alpha = 0.01, size = 3
) + ggplot2::guides(
  color = "none"
)
One immediately interesting trend here is that very few species are present in more than 5 environments at a given time. Indeed, only species 12, 14, 22, and 30 are ever present in more than 5 environments at once. We can also tabulate, by species, how long these periods last. We block time so that entries are all of the same unit length, and round the average richness within each time unit.
with(EnvDiversity %>% mutate(
  Time = floor(Time)
) %>% group_by(
  Time, Species
) %>% summarise(
  Richness = round(mean(Richness)),
  .groups = "drop"
),
table(Species, Richness))
Richness
Species 0 1 2 3 4 5 6 7
1 65961 14392 440 0 0 0 0 0
2 34771 31380 11353 3289 0 0 0 0
3 63201 13004 1767 2821 0 0 0 0
4 46359 21083 11443 1619 289 0 0 0
5 30641 31555 16546 1341 710 0 0 0
6 72058 8735 0 0 0 0 0 0
7 75519 5274 0 0 0 0 0 0
8 64756 15510 527 0 0 0 0 0
9 77914 2879 0 0 0 0 0 0
10 17812 38680 20196 4105 0 0 0 0
11 52660 24897 3236 0 0 0 0 0
12 9668 2677 9779 26641 18598 9921 2296 1213
13 64628 15714 451 0 0 0 0 0
14 12091 21311 14559 16445 15150 931 306 0
15 68562 10733 1498 0 0 0 0 0
16 69685 10908 200 0 0 0 0 0
17 30631 33715 15508 939 0 0 0 0
18 69209 11326 258 0 0 0 0 0
19 62453 17604 736 0 0 0 0 0
20 74863 3687 1850 393 0 0 0 0
21 47899 25705 4187 3002 0 0 0 0
22 3179 13471 20153 21767 16741 5308 174 0
23 65132 14944 717 0 0 0 0 0
24 65788 12316 2689 0 0 0 0 0
25 66081 11111 3601 0 0 0 0 0
26 41646 19406 19205 536 0 0 0 0
27 48803 20446 10283 1261 0 0 0 0
28 68979 11814 0 0 0 0 0 0
29 34625 26990 17051 2127 0 0 0 0
30 5560 19062 20684 11363 11600 7960 3304 1260
31 65010 15673 110 0 0 0 0 0
32 19323 26016 29600 5854 0 0 0 0
33 15126 23401 24348 10061 7857 0 0 0
34 76427 4366 0 0 0 0 0 0
35 73824 6084 885 0 0 0 0 0
36 71044 9749 0 0 0 0 0 0
37 38197 22367 15666 4486 77 0 0 0
38 28748 35323 16722 0 0 0 0 0
39 77005 3788 0 0 0 0 0 0
40 49949 27132 3712 0 0 0 0 0
41 79837 956 0 0 0 0 0 0
42 78309 2484 0 0 0 0 0 0
43 32813 23889 17055 6664 372 0 0 0
44 39693 23982 15556 1562 0 0 0 0
45 80793 0 0 0 0 0 0 0
46 48455 27672 4666 0 0 0 0 0
47 12049 36783 14830 15180 1951 0 0 0
48 63147 16984 662 0 0 0 0 0
49 73123 7670 0 0 0 0 0 0
50 79858 935 0 0 0 0 0 0
51 80254 539 0 0 0 0 0 0
52 73942 6851 0 0 0 0 0 0
53 77578 3215 0 0 0 0 0 0
54 76396 4397 0 0 0 0 0 0
55 55211 23955 1627 0 0 0 0 0
56 72559 8234 0 0 0 0 0 0
57 57438 22648 707 0 0 0 0 0
58 78987 1806 0 0 0 0 0 0
59 35310 37944 7539 0 0 0 0 0
60 67875 12918 0 0 0 0 0 0
61 52220 24754 3819 0 0 0 0 0
62 62514 16665 1614 0 0 0 0 0
63 46928 31448 2417 0 0 0 0 0
64 68971 11215 607 0 0 0 0 0
65 80793 0 0 0 0 0 0 0
66 67126 12564 1103 0 0 0 0 0
67 37877 27042 11836 4038 0 0 0 0
68 51133 27052 2205 403 0 0 0 0
69 61785 16497 2511 0 0 0 0 0
70 73226 7359 208 0 0 0 0 0
71 22115 22511 20022 14177 1968 0 0 0
72 10056 26079 33451 7593 3614 0 0 0
73 73693 7100 0 0 0 0 0 0
74 80793 0 0 0 0 0 0 0
75 77528 3007 258 0 0 0 0 0
76 34898 34767 10611 517 0 0 0 0
77 54795 21256 4119 623 0 0 0 0
78 57551 21264 1978 0 0 0 0 0
79 46958 28434 5401 0 0 0 0 0
80 46392 30479 3922 0 0 0 0 0
81 80793 0 0 0 0 0 0 0
82 72152 8641 0 0 0 0 0 0
83 57727 21903 1163 0 0 0 0 0
84 73059 7734 0 0 0 0 0 0
85 78106 2687 0 0 0 0 0 0
86 64837 15956 0 0 0 0 0 0
87 75894 4487 412 0 0 0 0 0
88 77086 3707 0 0 0 0 0 0
89 48322 32471 0 0 0 0 0 0
90 80793 0 0 0 0 0 0 0
91 36640 30722 11104 1868 459 0 0 0
92 54507 24631 1655 0 0 0 0 0
93 73941 6852 0 0 0 0 0 0
94 69270 10409 1114 0 0 0 0 0
95 43208 20297 14643 2645 0 0 0 0
96 65012 15781 0 0 0 0 0 0
97 75424 5369 0 0 0 0 0 0
98 65965 13006 1822 0 0 0 0 0
99 78627 2166 0 0 0 0 0 0
100 52399 15984 9607 2803 0 0 0 0
(The amount of data we generated is too much for the desktop, so we need to reduce it. To do so, we average over time blocks, here of length 100.)
AveragedAbundance <- result$Abundance %>% data.frame(
) %>% dplyr::mutate(
  time = floor(time / 100) * 100
) %>% dplyr::group_by(
  time
) %>% dplyr::summarise(
  dplyr::across(.fns = ~ mean(.x))
)
We can perform a PCA and see if our data can be summarised by a small number of dimensions. As we shall see, constraining to the first 25 principal components does not harm the system much.
PCA <- prcomp(AveragedAbundance %>% dplyr::select_if(~ any(. > 0)),
              center = TRUE, scale. = TRUE, rank. = 25)
It appears there is not actually a lot of dependence within the system. Consider the amount of variance explained by the first six principal components (ordered, as is tradition, by the amount of variation explained).
head(summary(PCA)$importance[2, ])
PC1 PC2 PC3 PC4 PC5 PC6
0.03749 0.03273 0.03095 0.02751 0.02501 0.02482
The total variation explained by the 25 retained components is 0.99994. The traditional biplot follows, but despite the seeming presence of patterns, so little of the variance is explained per component that it is probably not worth further examination.
ggplot2::autoplot(PCA, loadings = TRUE,
                  data = AveragedAbundance %>% dplyr::select_if(~ any(. > 0)),
                  colour = "time")
We can try again with presence-absence data instead.
AveragedPA <- result$Abundance %>% data.frame(
) %>% dplyr::mutate(
  time = floor(time / 100) * 100
) %>% dplyr::mutate(
  dplyr::across(.cols = !time, .fns = ~ .x > 0)
) %>% dplyr::group_by(
  time
) %>% dplyr::summarise(
  dplyr::across(.fns = ~ mean(.x))
)
PCAPA <- prcomp(AveragedPA %>% dplyr::select_if(~ any(. > 0)),
                center = TRUE, scale. = TRUE, rank. = 25)
head(summary(PCAPA)$importance[2, ])
PC1 PC2 PC3 PC4 PC5 PC6
0.05538 0.04585 0.04484 0.03901 0.03578 0.03410
So it is a bit better, but not by much.
ggplot2::autoplot(PCAPA, loadings = TRUE,
                  data = AveragedPA %>% dplyr::select_if(~ any(. > 0)),
                  colour = "time")
I should note that this is not a failure of the method per se. What this says to me is that dimension reduction cannot compress the system into a human-readable set of descriptors. The system is still being reduced (from 100 species × 10 environments to 25 components covering 0.99994 of the variation). My initial hypothesis for when the systems are coupled is that, as coupling increases, the number of principal components needed should decrease, since one system's changes will better predict the next system's changes. (An interesting related question: does the number of principal components correlate with the amount of biodiversity in the system?)
# ifrm() removes an object only if it exists; it is a helper,
# presumably from the (commented-out) RMTRCode2 package.
ifrm(AveragedAbundance)
ifrm(AveragedPA)
ifrm(PCA)
ifrm(PCAPA)
Since trying to reduce the system's dimensionality does not work (well enough), a next attempt might be to consider how the pairwise beta diversity changes over the course of the simulation. Initial problems include that there are quite a few measures of beta diversity, and that the (square of the) number of environments determines how many entries we need to consider.
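As a concrete starting point, one of the simpler measures is pairwise Jaccard dissimilarity computed from presence-absence vectors at a single time point. This is a sketch on toy values; the `pa` matrix is hypothetical, not simulation output:

```r
# Toy presence-absence data: rows = environments, cols = species
# (hypothetical values, not the simulation output).
pa <- rbind(
  env1 = c(1, 1, 0, 0, 1, 0),
  env2 = c(1, 0, 0, 1, 1, 0),
  env3 = c(0, 0, 1, 1, 0, 1)
)

# Jaccard dissimilarity: 1 - |shared species| / |species in either|.
jaccard <- function(a, b) 1 - sum(a & b) / sum(a | b)

beta <- outer(seq_len(nrow(pa)), seq_len(nrow(pa)),
              Vectorize(function(i, j) jaccard(pa[i, ], pa[j, ])))
beta
```

Tracking this matrix through time gives exactly the quadratic-in-environments number of entries noted above.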
One perhaps interesting idea is to group the environments into clusters by treating the abundances (or presence-absence) of the species present as traits. The question then is whether the environments tend to be attracted to specific points or wander around each other without relation (the latter being more neutral). If they appear convergent (which would seem to disagree with most of the alpha-diversity analyses so far), that would imply that dynamics determine the majority of the system's fate; if they instead wander more randomly, the system would appear to be dominated by the neutral mechanisms. (Of course, this is affected by the rate of the neutral mechanisms, so it exists on a definite continuum. Note also that cluster analysis usually includes something like PCA, so we would not necessarily expect a different result.)
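A minimal version of that clustering idea, using base R's binary distance and average-linkage hierarchical clustering; the `traits` matrix is a hypothetical stand-in for the per-environment presence-absence data:

```r
# Toy presence-absence "traits": rows = environments, cols = species.
traits <- rbind(
  env1 = c(1, 1, 0, 0, 1),
  env2 = c(1, 1, 0, 0, 0),
  env3 = c(0, 0, 1, 1, 0),
  env4 = c(0, 0, 1, 1, 1),
  env5 = c(1, 0, 1, 0, 1)
)

# Binary (Jaccard-style) distances, then average-linkage clustering.
d <- dist(traits, method = "binary")
cl <- hclust(d, method = "average")
groups <- cutree(cl, k = 2)
groups
```

Repeating the cut at successive time points would show whether cluster memberships stabilise (convergence) or keep reshuffling (wandering).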
An additional idea that might be worth an initial look is whether we can map the subgraph of the assembly graph explored by the simulation set. This is probably of the lowest priority of the things described above, but it is of mathematical interest and has some relationship to diversity measures. Such a map would usually be visualised with columns of nodes grouped by richness and edges representing moves between states. We can also colour the edges by whether they occurred via arrival, dynamic extinction, or neutral extinction.
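A first pass at extracting such a graph from a single environment's state sequence might look like the following sketch. The `states` vector is a hypothetical stand-in for the Species column computed earlier, and only arrival versus extinction can be distinguished here; separating dynamic from neutral extinctions would need the event log:

```r
# Toy sequence of community states (sorted species-id strings), standing
# in for the per-environment Species column computed earlier.
states <- c("", "3", "3,7", "3,7,12", "3,12", "3", "3,9")

richness <- lengths(strsplit(states, ","))
richness[states == ""] <- 0L

# Consecutive states become edges; the richness change classifies them.
edges <- data.frame(
  from = head(states, -1),
  to   = tail(states, -1),
  type = ifelse(diff(richness) > 0, "arrival", "extinction")
)
edges
```

Pooling such edge lists across environments (and colouring by `type`) would give the explored subgraph described above.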